Deep learning has been widely used in the perception (e.g., 3D object detection) of intelligent vehicle driving. Thanks to Vehicle-to-Vehicle (V2V) communication, deep-learning-based features from other agents can be shared with the ego vehicle to improve its perception. This is known as Cooperative Perception in V2V research, and its algorithms have advanced dramatically in recent years. However, all existing cooperative perception algorithms assume ideal V2V communication, ignoring the fact that the shared features may be corrupted by Lossy Communication (LC), which is common in complex real-world driving scenarios. In this paper, we first study the side effect (e.g., detection performance drop) of lossy communication on V2V Cooperative Perception, and then propose a novel intermediate LC-aware feature fusion method that relieves this side effect with an LC-aware Repair Network (LCRN) and enhances the interaction between the ego vehicle and other vehicles with a specially designed V2V Attention Module (V2VAM), which includes intra-vehicle attention for the ego vehicle and uncertainty-aware inter-vehicle attention. Extensive experiments on the public cooperative perception dataset OPV2V (based on the digital-twin CARLA simulator) demonstrate that the proposed method is highly effective for cooperative point-cloud-based 3D object detection under lossy V2V communication.
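The abstract does not specify the V2VAM at code level. As a loose illustration of the idea of uncertainty-aware inter-vehicle attention, the following NumPy sketch fuses the ego vehicle's BEV feature map with feature maps received over V2V links, down-weighting agents whose links are judged unreliable. The function name, tensor shapes, dot-product attention score, and uncertainty penalty are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_v2v_features(ego_feat, shared_feats, uncertainties):
    """Fuse ego BEV features with features shared by other vehicles.

    ego_feat:      (C, H, W) ego-vehicle feature map
    shared_feats:  (N, C, H, W) feature maps received over V2V links
    uncertainties: (N,) per-agent uncertainty scores; noisier (lossier)
                   links get down-weighted in the attention
    """
    all_feats = np.concatenate([ego_feat[None], shared_feats], axis=0)  # (N+1, C, H, W)
    # Attention logit per agent and location: similarity to the ego feature.
    logits = (all_feats * ego_feat[None]).sum(axis=1)                   # (N+1, H, W)
    # Penalize uncertain agents; the ego vehicle itself has zero uncertainty.
    penalty = np.concatenate([[0.0], uncertainties])
    logits = logits - penalty[:, None, None]
    weights = softmax(logits, axis=0)                                   # (N+1, H, W)
    return (weights[:, None] * all_feats).sum(axis=0)                   # (C, H, W)
```

A per-location softmax over agents keeps the fused map the same shape as the ego map, so it can replace the ego feature in any downstream detection head.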
Translated by Google Translate
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
Understanding objects is a central building block of artificial intelligence, especially for embodied AI. Even though object recognition excels with deep learning, current machines still struggle to learn higher-level knowledge, e.g., what attributes an object has and what we can do with it. In this work, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. It requires machines to reason out object affordances and simultaneously give the reason: what attributes make an object possess these affordances. To support OCL, we build a densely annotated knowledge base including extensive labels for three levels of object concepts (category, attribute, affordance) and the causal relations among the three levels. By analyzing the causal structure of OCL, we present a baseline, the Object Concept Reasoning Network (OCRN). It leverages causal intervention and concept instantiation to infer the three levels following their causal relations. In experiments, OCRN effectively infers object knowledge while following the causalities well. Our data and code are available at https://mvig-rhos.com/ocl.
Recently, Vehicle-to-Everything (V2X) cooperative perception has attracted increasing attention. Infrastructure sensors play a critical role in this research field; however, how to find the optimal placement of infrastructure sensors has rarely been studied. In this paper, we investigate the problem of infrastructure sensor placement and propose a pipeline that can efficiently and effectively find optimal installation positions for infrastructure sensors in a realistic simulated environment. To better simulate and evaluate LiDAR placement, we establish a Realistic LiDAR Simulation library that can simulate the unique characteristics of different popular LiDARs and produce high-fidelity LiDAR point clouds in the CARLA simulator. By simulating point cloud data under different LiDAR placements, we evaluate the perception accuracy of these placements using multiple detection models. We then analyze the correlation between the point cloud distribution and perception accuracy by calculating the density and uniformity of regions of interest. Experiments show that the placement of infrastructure LiDAR can heavily affect perception accuracy, and validate that density and uniformity can serve as indicators of performance.
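The abstract names density and uniformity of a region of interest as indicators of perception performance without defining them. A minimal sketch, assuming density means points per unit ROI area and uniformity is derived from the coefficient of variation of per-cell point counts on a bird's-eye-view grid (the paper's exact definitions may differ):

```python
import numpy as np

def roi_density_uniformity(points, roi_min, roi_max, grid=(4, 4)):
    """Density and uniformity of a LiDAR point cloud inside a rectangular ROI.

    points:  (N, 2) x/y point coordinates in bird's-eye view
    returns: density    = points per unit area inside the ROI
             uniformity = 1 / (1 + CV of per-cell counts); 1.0 means the
                          grid cells are covered perfectly evenly
    """
    roi_min, roi_max = np.asarray(roi_min, float), np.asarray(roi_max, float)
    inside = np.all((points >= roi_min) & (points < roi_max), axis=1)
    pts = points[inside]
    density = len(pts) / np.prod(roi_max - roi_min)
    counts, _, _ = np.histogram2d(
        pts[:, 0], pts[:, 1], bins=grid,
        range=[[roi_min[0], roi_max[0]], [roi_min[1], roi_max[1]]])
    cv = counts.std() / counts.mean() if counts.mean() > 0 else np.inf
    uniformity = 1.0 / (1.0 + cv)
    return density, uniformity
```

Under these definitions, a point cloud clustered in one corner of the ROI has the same density as an evenly spread one but a much lower uniformity score.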
Single-view reconstruction of hand-object interaction is challenging due to the severe loss of observations caused by occlusion. This paper proposes a physics-based method to better resolve the ambiguities in reconstruction. It first proposes a force-based dynamics model that not only recovers unobserved contacts but also solves for plausible contact forces. Next, a confidence-based slide-prevention scheme is proposed, which combines both kinematic confidence and contact forces to jointly model static and sliding contact motion. Qualitative and quantitative experiments show that the proposed technique reconstructs physically plausible and more accurate hand-object interactions and estimates plausible contact forces in real time with a single RGBD sensor.
Neural volumetric representations have shown that an MLP network can be trained with multi-view calibrated images to represent the geometry and appearance of a scene, without explicit 3D supervision. Object segmentation can enrich many downstream applications based on the learned radiance field. However, introducing hand-crafted segmentation to define regions of interest in complex real-world scenes is non-trivial and expensive, as it requires per-view annotations. This paper explores self-supervised learning of object segmentation using NeRF for complex real-world scenes. Our framework, NeRF-SOS, couples object segmentation and neural radiance fields to segment objects from any view within a scene. By proposing a novel collaborative contrastive loss at both the appearance and geometry levels, NeRF-SOS encourages the NeRF model to distill compact geometry-aware segmentation clusters from its density field together with self-supervised pre-trained 2D visual features. The self-supervised object segmentation framework can be applied to various NeRF models, yielding both photo-realistic rendering results and convincing segmentations for indoor and outdoor scenes. Extensive results on the LLFF, Tanks and Temples datasets validate the effectiveness of NeRF-SOS. It consistently surpasses other image-based self-supervised baselines and even captures finer details than the supervised Semantic-NeRF.
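The collaborative contrastive loss is only described at a high level. As a toy sketch of the appearance-level idea, assuming it encourages pixels with similar self-supervised 2D features to fall into the same segmentation cluster (the actual NeRF-SOS loss also operates on the geometry/density field and may be formulated differently):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correspondence_distillation_loss(feats, seg_logits):
    """Toy appearance-level contrastive term.

    feats:      (N, D) self-supervised 2D features for N sampled pixels
    seg_logits: (N, K) per-pixel cluster logits from the segmentation head

    Rewards pixel pairs whose feature affinity and segmentation affinity
    agree; lower is better.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    feat_affinity = f @ f.T          # cosine similarity, in [-1, 1]
    s = softmax(seg_logits)
    seg_affinity = s @ s.T           # probability both pixels share a cluster
    return -(feat_affinity * seg_affinity).mean()
```

With this toy term, an assignment that groups feature-similar pixels into one cluster scores lower (better) than one that splits them arbitrarily.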
Thanks to cross-modality retrieval techniques, visible-infrared (RGB-IR) person re-identification (Re-ID) is achieved by projecting the two modalities into a common space, enabling person re-identification in 24-hour surveillance systems. However, in terms of the probe-to-gallery setting, almost all existing RGB-IR based cross-modality person Re-ID methods focus on image-to-image matching, while video-to-video matching, which contains much richer spatial and temporal information, remains unexplored. In this paper, we primarily study video-based cross-modality person Re-ID. To achieve this task, a video-based RGB-IR dataset is constructed, containing 927 valid identities with 463,259 frames and 21,863 tracklets captured by 12 RGB/IR cameras. Based on our constructed dataset, we prove that as the number of frames in a tracklet increases, performance is indeed further enhanced, demonstrating the significance of video-to-video matching in RGB-IR person Re-ID. Additionally, a novel method is further proposed, which not only projects the two modalities into a modality-invariant subspace but also extracts motion-invariant temporal memory. Thanks to these two strategies, much better results are achieved on our video-based cross-modality person Re-ID task. The code and dataset are released at: https://github.com/vcmproject233/mitml.
Federated learning (FL) supports the distributed training of a global machine learning model across multiple clients with the help of a central server. The local dataset held by each client is never exchanged in FL, so local data privacy is protected. Although FL is increasingly popular, data heterogeneity across different clients leads to the client model drift issue, resulting in degraded model performance and poor model fairness. To address this issue, we design federated learning with a global-local knowledge fusion (FedKF) scheme in this paper. The key idea in FedKF is to let the server return the global knowledge in each training round to be fused with the local knowledge, so that the local model can be regularized towards the global optimum. Thus, the client model drift issue can be mitigated. In FedKF, we first propose an active model aggregation technique that supports a precise global knowledge representation. Then, we propose a data-free knowledge distillation (KD) method to facilitate KD from the global model to the local model, while the local model can still learn the local knowledge (embedded in the local dataset) simultaneously, thereby realizing the global-local knowledge fusion process. Theoretical analysis and intensive experiments demonstrate that FedKF achieves high model performance, high fairness, and privacy preservation simultaneously. The project source code will be released on GitHub after the paper review.
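The abstract describes fusing global knowledge into local training via knowledge distillation without giving the objective. A minimal sketch of one common way to realize such a global-local fusion loss, assuming a cross-entropy term on local labels plus a temperature-scaled KL term that pulls the local model towards the global model's predictions (FedKF's actual aggregation and data-free generator are not reproduced here):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def global_local_fusion_loss(local_logits, global_logits, labels, alpha=0.5, tau=2.0):
    """Local supervision plus KD from the global model.

    local_logits:  (B, K) logits of the local (client) model
    global_logits: (B, K) logits of the global (server) model on the same batch
    labels:        (B,) ground-truth class indices from the local dataset
    """
    p_local = softmax(local_logits)
    # Cross-entropy on the local data keeps learning local knowledge.
    ce = -np.log(p_local[np.arange(len(labels)), labels] + 1e-12).mean()
    # Temperature-scaled KL regularizes the local model towards the global one.
    p_g = softmax(global_logits / tau)
    p_l = softmax(local_logits / tau)
    kl = (p_g * (np.log(p_g + 1e-12) - np.log(p_l + 1e-12))).sum(axis=-1).mean()
    return ce + alpha * (tau ** 2) * kl
```

When the local and global models agree, the KL term vanishes and only the local cross-entropy remains; disagreement adds a penalty, which is the regularization effect that mitigates client drift.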
Recently, deep-learning-based methods have been widely studied for deformable image registration tasks. However, most efforts directly map the composite image representation to spatial transformations through convolutional neural networks, ignoring their limited ability to capture spatial correspondences. On the other hand, Transformers can better characterize spatial relationships with the attention mechanism, yet their long-range dependencies may be harmful to the registration task, where voxels that are too far apart are unlikely to be corresponding pairs. In this study, we propose a novel Deformer module along with a multi-scale framework for the deformable image registration task. The Deformer module is designed to facilitate the mapping from image representations to spatial transformations by formulating the displacement vector prediction as a weighted sum of several bases. With the multi-scale framework predicting the displacement field in a coarse-to-fine manner, superior performance can be achieved compared with traditional and learning-based methods. Comprehensive experiments on two public datasets are conducted to demonstrate the effectiveness of the proposed Deformer module and the multi-scale framework.
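The core idea, predicting each voxel's displacement vector as a weighted sum of several bases, can be sketched in a few lines. The shapes, the softmax over basis scores, and the function name are assumptions for illustration; the actual Deformer module produces the scores and bases with learned attention layers.

```python
import numpy as np

def deformer_displacement(scores, bases):
    """Per-voxel displacement as a weighted sum of K displacement bases.

    scores: (H, W, D, K) raw per-voxel scores over the K bases
    bases:  (K, 3) basis displacement vectors
    returns the (H, W, D, 3) displacement field
    """
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)        # softmax over bases
    return np.einsum('hwdk,kc->hwdc', weights, bases)  # weighted sum
```

Restricting each voxel's output to the span of a few shared bases is what keeps the predicted transformation smoother than an unconstrained per-voxel regression.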
Recent years have witnessed learned representations built directly on point clouds. Although increasingly expressive, most existing representations still struggle to generate ordered point sets. Inspired by spherical multi-view scanners, we propose a novel sampling model called Spotlights, which represents a 3D shape as a compact 1D array of depth values. It simulates a configuration of cameras uniformly distributed on a sphere, where each virtual camera casts rays from its principal point through sample points on a small concentric spherical cap to probe possible intersections with the object enclosed by the sphere. The structured point cloud is thus implicitly given as a function of depths. We provide a detailed geometric analysis of this new sampling scheme and demonstrate its effectiveness in the context of the point cloud completion task. Experimental results on both synthetic and real data show that our method achieves competitive accuracy and consistency while significantly reducing the computational cost. Furthermore, we show superior performance over state-of-the-art completion methods on the downstream point cloud registration task.
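The geometry of the sampling scheme can be illustrated with a toy version: cameras placed approximately uniformly on a bounding sphere, each casting its central ray toward the origin, probing an analytic sphere so that depths have a closed form. The Fibonacci placement, the single central ray per camera (the paper uses a cap of rays), and the miss sentinel are simplifying assumptions.

```python
import numpy as np

def fibonacci_sphere(m):
    """m approximately uniform unit directions on the sphere."""
    i = np.arange(m) + 0.5
    phi = np.arccos(1 - 2 * i / m)
    theta = np.pi * (1 + 5 ** 0.5) * i
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

def spotlights_depth_array(n_cams, cam_radius, obj_radius, miss_value=0.0):
    """Compact 1D depth array from cameras on a bounding sphere.

    Each camera casts a ray toward the origin; the enclosed 'object' is an
    analytic sphere of radius obj_radius, so ray-sphere intersection gives
    the depth in closed form. The structured point cloud is recovered as
    camera_center + depth * ray_direction.
    """
    cams = cam_radius * fibonacci_sphere(n_cams)                   # camera centers
    dirs = -cams / np.linalg.norm(cams, axis=1, keepdims=True)     # toward origin
    od = (cams * dirs).sum(axis=1)                                 # o . d per ray
    disc = od ** 2 - (cam_radius ** 2 - obj_radius ** 2)           # discriminant
    hit = disc >= 0
    depths = np.where(hit, -od - np.sqrt(np.maximum(disc, 0.0)), miss_value)
    points = cams + depths[:, None] * dirs                         # structured cloud
    return depths, points
```

Because the camera configuration is fixed, the depth array alone encodes the shape in a fixed order, which is what makes the representation compact and ordered.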